Aller au contenu principal

Data Warehouse

What is a Data Warehouse

A data warehouse is a centralized repository that stores structured, semi-structured data from multiple sources in a predefined schema, specifically designed for data analysis, BI, and reporting.

It contains historical data that has been cleaned, transformed, and organized according to a specific business model or requirement.

Data must be processed and transformed (ETL - Extract, Transform, Load) or ELT (Extract, Load, Transform) before being stored in the warehouse.

In general, data warehouses are optimized for Gigabytes to Petabytes of data.

 

image

Why use a Data Warehouse

  • Optimized for Analysis: Data is structured and organized for optimal query performance and business intelligence.
  • Data Consistency: Enforces data quality and provides a single source of truth for business reporting.
  • Historical Analysis: Maintains historical data for trend analysis and business intelligence.
  • Business Focus: Designed specifically for business users and analysts to access standardized data for reporting and decision-making.

Challenges of Data Warehouses

  • Schema Rigidity: Predefined schema makes it difficult to adapt to new data types or changing business requirements.
  • Cost: High setup and maintenance costs due to specialized hardware and software requirements.
  • Time to Value: Implementing ETL processes and maintaining data models can be time-consuming.
  • Limited Data Types: Primarily designed for structured data, making it difficult to handle unstructured or semi-structured data formats.

OLAP Systems

Data warehouses are built on OLAP (Online Analytical Processing) systems, which differ significantly from OLTP (Online Transaction Processing) systems used in operational databases.

image

Key Differences

CharacteristicOLTPOLAP
PurposeDaily transactions processingData analysis and reporting
Data ModelNormalizedDenormalized
Data UpdatesContinuous updatesPeriodic batch updates
Query TypeSimple transactions, few recordsComplex queries, many records
Response TimeMillisecondsSeconds to minutes
Space RequirementsGigabytesTerabytes to Petabytes
Data HistoryCurrent data (days/months)Historical data (years)
OptimizationFor insert/updateFor complex queries